Improving Software Pipelining with Unroll-and-Jam
Authors
Abstract
To take advantage of recent architectural improvements in microprocessors, advanced compiler optimizations such as software pipelining have been developed [1, 2, 3, 4]. Unfortunately, not all loops have enough parallelism in the innermost loop body to take advantage of all of the resources a machine provides. Unroll-and-jam is a transformation that can be used to increase the amount of parallelism in the innermost loop body by making better use of resources and limiting the effects of recurrences [5, 6]. In this paper, we demonstrate how unroll-and-jam can significantly improve the initiation interval in a software-pipelined loop. Improvements in the initiation interval of greater than 40% are common, while dramatic improvements of a factor of 5 are possible.
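As a rough illustration of the transformation the abstract describes, the sketch below applies unroll-and-jam by hand to a small matrix-vector accumulation. It is not taken from the paper: the function names `matvec_accumulate` and `matvec_unroll_and_jam`, the unroll factor of 2, and the choice of loop nest are all assumptions made for the example. In the original nest, each inner iteration depends on the previous one through `y[i]`. After the outer loop is unrolled and the two inner-loop copies are fused, the inner body holds two independent accumulations that share one load of `x[j]`.

```python
def matvec_accumulate(a, x, y):
    """Original loop nest: y[i] += a[i][j] * x[j] (hypothetical example)."""
    n, m = len(a), len(x)
    for i in range(n):
        for j in range(m):
            # Each iteration depends on the previous one through y[i].
            y[i] += a[i][j] * x[j]
    return y


def matvec_unroll_and_jam(a, x, y):
    """Same computation with the outer loop unrolled by 2 and the
    two resulting inner loops fused (jammed) into one."""
    n, m = len(a), len(x)
    i = 0
    while i + 1 < n:
        for j in range(m):
            xj = x[j]  # read once, reused by both statements
            # Two independent recurrences now sit in one inner-loop body.
            y[i] += a[i][j] * xj
            y[i + 1] += a[i + 1][j] * xj
        i += 2
    # Cleanup loop: handles the leftover row when n is odd.
    for r in range(i, n):
        for j in range(m):
            y[r] += a[r][j] * x[j]
    return y
```

Both versions perform the additions for each row in the same order, so they produce identical results. The difference lies only in how much independent work each inner-loop iteration exposes to a scheduler.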
Related resources
Register Pressure Guided Unroll-and-Jam
Unroll-and-jam is an effective loop optimization that not only improves cache locality and instruction level parallelism (ILP) but also benefits other loop optimizations such as scalar replacement. However, unroll-and-jam increases register pressure, potentially resulting in performance degradation when the increase in register pressure causes register spilling. In this paper, we present a low ...
Unroll-And-Jam Guided by A Linear-Algebra-Based Data-Reuse Model
Because of the existence of a memory bottleneck in modern microprocessors, idle computational cycles in pipelined multiple functional units slow down the program performance. One solution to this problem is applying loop unroll-and-jam to improve the ratio of memory operations to floating-point operations for loops according to the target machine optimal ratio. In doing so, both enough computat...
Optimizing Sparse Matrix-Vector Product Computations Using Unroll and Jam
Large-scale scientific applications frequently compute sparse matrix vector products in their computational core. For this reason, techniques for computing sparse matrix vector products efficiently on modern architectures are important. This paper describes a strategy for improving the performance of sparse matrix vector product computations using a loop transformation known as unroll-and-jam. ...
Source-to-Source Transformations for Efficient SIMD Code Generation
In the last years, there has been much effort in commercial compilers to generate efficient SIMD instructions-based code sequences from conventional sequential programs. However, the small numbers of compilers that can automatically use these instructions achieve in most cases unsatisfactory results. Therefore, the code often has to be written manually in assembly language or using compiler bui...
A Model for Hardware Realization of Kernel Loops
Hardware realization of kernel loops holds the promise of accelerating the overall application performance and is therefore an important part of the synthesis process. In this paper, we consider two important loop optimization techniques, namely loop unrolling and software pipelining that can impact the performance and cost of the synthesized hardware. We propose a novel model that accounts for...